Abstract. The LISA International Science Team Working Group on Data Analysis (LIST-WG1B)
is sponsoring several rounds of mock data challenges, with the purpose of fostering development
of LISA data-analysis capabilities, and of demonstrating technical readiness for the maximum
science exploitation of the LISA data. The first round of challenge data sets was released at this
Symposium. We describe the models and conventions (for LISA and for gravitational-wave sources) 
used to prepare the data sets, the file format used to encode them, and the tools and resources 
available to support challenge participants. 



The objectives, structure, and timeline of the Mock LISA Data Challenges (MLDCs) 
are discussed in the other contribution in this volume by the MLDC Task Force. Here 
we concentrate on the technical side of the challenges, and in particular on the theoretical
models used to embody LISA and gravitational-wave (GW) sources in the first
Challenge, and on the file format used to distribute the Challenge datasets. More details 
can be found on the official MLDC website [1], in the living Omnibus document for 
Challenge 1 [2], and on the MLDC Task Force wiki [3]. 



1. MODELING LISA 

The analysis of real LISA data will necessarily involve a detailed modeling of instrument 
response and noise; for the purposes of the MLDCs, it is desirable to decouple this aspect 
from the inherent complexity of GW analysis, by distilling the LISA measurements into 
a standard idealized model. Thus, the MLDC task force has developed a set of pseudo-LISA
assumptions and conventions, which we lay out in this section (see also Ref. [2]).
Both of the LISA response simulators used in Challenge 1 (the LISA Simulator [4] and 
Synthetic LISA [5]) comply with these assumptions, and adhere to these conventions. 
In later challenges, as the craft of LISA data-analysis matures, so will the pseudo-LISA 
model, becoming increasingly realistic. 

1.1. The pseudo-LISA orbits 

The pseudo-LISA orbits are obtained by truncating exact Keplerian orbits for a small 
mass orbiting the Sun to first order in the eccentricity (see the Appendix of Ref. [4]). 
In Solar-System Barycentric (SSB) coordinates (with the x axis aligned with the vernal 
point), we set 

$$
\begin{aligned}
x_n &= a\cos\alpha + a e\,\bigl(\sin\alpha\cos\alpha\sin\beta_n - (1+\sin^2\alpha)\cos\beta_n\bigr), \\
y_n &= a\sin\alpha + a e\,\bigl(\sin\alpha\cos\alpha\cos\beta_n - (1+\cos^2\alpha)\sin\beta_n\bigr), \\
z_n &= -\sqrt{3}\,a e\cos(\alpha-\beta_n),
\end{aligned}
\tag{1}
$$

where $\beta_n = (n-1) \times 2\pi/3 + \lambda$ ($n = 1, 2, 3$) is the relative orbital phase of each
spacecraft, $a = 1$ AU is the semi-major axis of the guiding center, and
$\alpha(t) = 2\pi t/(1\,{\rm year}) + \kappa$ is its orbital phase. In this approximation, the spacecraft
form a rigid equilateral triangle with side length $L = 2\sqrt{3}\,a e = 5 \times 10^6$ km for
$e = 0.00965$. (In fact, the LISA Simulator and Synthetic LISA implement $e^2$-accurate orbits,
but the additional terms make very little difference to the instrument response.)

The parameters $\kappa$ and $\lambda$ (InitialPosition and InitialRotation in lisaXML,
see Sec. 3.2) set the initial location and orientation of the LISA constellation; in
Challenge 1, $\kappa = \lambda = 0$. This choice places LISA at the vernal point, with spacecraft 1
directly below the guiding center in the southern ecliptic hemisphere. See Ref. [2] for
expressions to convert to other LISA orbit specifications.
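
As an illustration, the first-order orbits of Eq. (1) can be evaluated numerically. The following is a minimal sketch, not the code used by either simulator; the function name, the value adopted for the AU in km, and the definition of "1 year" as a sidereal year are assumptions made here for the example.

```python
import numpy as np

AU_KM = 1.495978707e8    # a = 1 AU expressed in km (assumed value)
ECC = 0.00965            # eccentricity e quoted in the text
YEAR_S = 3.15581498e7    # "1 year" taken as a sidereal year in s (assumption)


def spacecraft_position(n, t, kappa=0.0, lam=0.0):
    """SSB position (km) of spacecraft n = 1, 2, 3 at SSB time t (s), from Eq. (1)."""
    alpha = 2.0 * np.pi * t / YEAR_S + kappa           # guiding-center phase
    beta = (n - 1) * 2.0 * np.pi / 3.0 + lam           # relative phase of spacecraft n
    sa, ca = np.sin(alpha), np.cos(alpha)
    sb, cb = np.sin(beta), np.cos(beta)
    a, e = AU_KM, ECC
    x = a * ca + a * e * (sa * ca * sb - (1.0 + sa**2) * cb)
    y = a * sa + a * e * (sa * ca * cb - (1.0 + ca**2) * sb)
    z = -np.sqrt(3.0) * a * e * np.cos(alpha - beta)
    return np.array([x, y, z])


def arm_length(i, j, t):
    """Distance (km) between spacecraft i and j at time t."""
    return np.linalg.norm(spacecraft_position(i, t) - spacecraft_position(j, t))


if __name__ == "__main__":
    print(spacecraft_position(1, 0.0))   # spacecraft 1 at t = 0: y = 0, z < 0
    print(arm_length(1, 2, 1.0e6))       # close to 5e6 km at any time
```

With $\kappa = \lambda = 0$ the sketch places spacecraft 1 at $y = 0$ with negative $z$ at $t = 0$, and the three arm lengths stay equal to $2\sqrt{3}\,ae$ at all times, as stated above.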

All times are measured by an ideal clock at the SSB. 

1.2. The LISA response 

The basic (individual-link) LISA response to GWs is taken to be the phase response
$\Phi_{ij}$ used in the LISA Simulator and discussed in Sec. II of Ref. [4], or the fractional
frequency response $y^{\rm gw}_{slr}$ used in Synthetic LISA and discussed in Sec. II B of Ref. [5].
(See the TDI Rosetta Stone [6] for translations between index notations.) The phase
and fractional-frequency formalisms are equivalent, and are related by a simple time
integration. The former has the advantage of representing more closely the actual output
of the LISA phasemeters; the latter of being directly proportional to (differences of) the
gravitational strains at the spacecraft. (In fact the LISA Simulator produces equivalent-strain
data, with a nominal length of $L_n = 10^{10}$ m. To convert equivalent strain to
fractional frequency, differentiate and multiply by $2\pi L_n/c$.)

LISA will employ Time-Delay Interferometry (TDI; see Refs. [7, 8, 9]) to cancel the 
otherwise overwhelming laser phase noise. In essence, TDI observables are constructed
from time-delayed linear combinations of individual-link measurements, and they represent
synthesized interferometers where laser phase fluctuations move in closed paths
across the LISA arms. More complicated paths are required to deal with the variations 
of the armlengths due to the finer details of the LISA orbits, giving rise to the three TDI 
"generations." 

It is expected that high-level LISA data-analysis tasks (such as those targeted in the 
challenges) will be performed directly on TDI observables, and not on the underlying 
phase measurements. Thus, for the initial challenges we elect to represent the LISA 
output as TDI 1.5 observables [8, 9], and in particular as the unequal-arm Michelson 
observables X, Y, and Z defined in Ref. [9]. Strictly speaking, TDI 2.0 would be
required to cancel laser noise completely in a rotating and flexing LISA array such 
as our pseudo-LISA; however, the upgrade from TDI 1.5 to 2.0 changes little in the 
response to GW signals, but it requires more careful numerical treatments and adds to 
the complexity of analysis codes. Thus, the initial Challenge data sets contain TDI 1.5 
observables without laser noise. 

1.3. The pseudo-LISA noises 

The model of LISA instrument noise adopted in Challenge 1 includes only contributions
from optical noise (assumed white in phase, with one-sided spectral density
$S^{1/2}_{\rm opt}(f) = 20 \times 10^{-12}$ m Hz$^{-1/2}$), and from acceleration noise (assumed
white in acceleration, but increasing as $1/f$ below $10^{-4}$ Hz, with one-sided spectral density
$S^{1/2}_{\rm acc}(f) = 3 \times 10^{-15}\,[1 + (10^{-4}\,{\rm Hz}/f)^2]^{1/2}$ m s$^{-2}$ Hz$^{-1/2}$),
but not from laser phase noise, as discussed above.

The six optical noises and six acceleration noises (for the two optical benches on each 
spacecraft) are modeled as independent Gaussian random processes, and are realized 
in practice with sequences of pseudo-random numbers. Specifically, Synthetic LISA 
generates independent Gaussian deviates (i.e., white noise) in the time domain, and then 
filters them digitally to obtain the desired spectral shape; the LISA Simulator generates
independent Gaussian deviates in the frequency domain, multiplies them by $S^{1/2}(f)$, and
FFTs to the time domain.
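
The frequency-domain recipe can be sketched in a few lines of NumPy. This is only an illustration of the general technique, not the LISA Simulator code: the function names, the normalization convention (one-sided spectral density, NumPy's unnormalized forward FFT), and the choice to zero the DC bin are assumptions made for this example.

```python
import numpy as np


def optical_asd(f):
    """Root spectral density of the optical noise, in m Hz^-1/2 (white)."""
    return 20e-12 * np.ones_like(f, dtype=float)


def acceleration_asd(f):
    """Root spectral density of the acceleration noise, in m s^-2 Hz^-1/2."""
    return 3e-15 * np.sqrt(1.0 + (1e-4 / f) ** 2)


def noise_from_asd(asd, n_samples, dt, rng):
    """Draw a real Gaussian time series whose one-sided root PSD follows asd(f)."""
    freqs = np.fft.rfftfreq(n_samples, d=dt)
    spectrum = np.zeros(freqs.size, dtype=complex)   # DC bin left at zero
    # Each of the real and imaginary parts gets variance N * S(f) / (4 dt),
    # so that the time series has one-sided PSD S(f) with numpy's FFT convention.
    scale = asd(freqs[1:]) * np.sqrt(n_samples / (4.0 * dt))
    spectrum[1:] = scale * (rng.standard_normal(scale.size)
                            + 1j * rng.standard_normal(scale.size))
    if n_samples % 2 == 0:
        # The Nyquist bin of a real series must be real.
        spectrum[-1] = np.sqrt(2.0) * spectrum[-1].real
    return np.fft.irfft(spectrum, n=n_samples)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = noise_from_asd(acceleration_asd, 2**16, 1.0, rng)
    print(x.std())
```

For white noise the sample variance should approach $S/(2\,\Delta t)$, which gives a quick sanity check of the normalization.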



2. MODELING GW SOURCES 

Another source of complexity that we wish to exclude from the initial challenges is 
the uncertainty about the true shape of the gravitational waveforms that Nature will 
provide to LISA. However, we can already begin to prepare for their detection and analysis,
while waiting for theory to provide more and more accurate models, by working with 
fully known waveforms of comparable structure and increasing complexity. This section 
describes the standard simplified waveforms used in Challenge 1 to embody the signals 
emitted by the three kinds of GW sources under consideration: galactic binaries, massive 
black-hole binaries, and extreme-mass-ratio inspirals. Special care was devoted to
choosing standard source parametrizations that could be used by MLDC participants to
report their analysis results and compare them easily.

TABLE 1. Common source parameters. Note that in the initial challenges we do not
deal explicitly with the redshifting of sources at cosmological distances; thus, D is a
luminosity distance, and the masses and frequencies of Tables 3 are those measured at
the SSB, which are red/blue-shifted by factors $(1+z)^{\pm 1}$ w.r.t. those measured locally
near the sources.

Parameter             Symbol      Standard parameter name   Standard unit
                                  (lisaXML descr.)          (lisaXML descr.)
Ecliptic latitude     $\beta$     EclipticLatitude          Radian
Ecliptic longitude    $\lambda$   EclipticLongitude         Radian
Polarization angle    $\psi$      Polarization              Radian
Inclination           $\iota$     Inclination               Radian
Luminosity distance   $D$         Distance                  Parsec

2.1. Conventions 

The sky location of a GW source is described by its J2000 ecliptic latitude $\beta$ and
longitude $\lambda$, the latter measured from the vernal point, aligned with the x axis in our
convention. We model gravitational radiation from the source as a plane wave traveling
along the direction $\hat{k} = -(\cos\beta\cos\lambda, \cos\beta\sin\lambda, \sin\beta)$, with surfaces of
constant phase given by $\xi = t - \hat{k}\cdot x$. As written in the transverse-traceless gauge, the
gravitational strain tensor can be decomposed in two standard polarization states,

$$
h(\xi) = h_+(\xi)\,[\hat{u}\otimes\hat{u} - \hat{v}\otimes\hat{v}] + h_\times(\xi)\,[\hat{u}\otimes\hat{v} + \hat{v}\otimes\hat{u}],
\tag{2}
$$

where $h_+(\xi)$ and $h_\times(\xi)$ multiply the polarization tensors $e_+$ and $e_\times$ formed from
$\hat{u} = \partial\hat{k}/\partial\beta$, $\hat{v} \propto \partial\hat{k}/\partial\lambda$. Thus, GWs from any MLDC source are
completely specified by $\beta$, $\lambda$, and by the two functions $h_+(\xi)$ and $h_\times(\xi)$ for the
source's GW polarization amplitudes, measured at the SSB.
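
To make the geometry concrete, the propagation vector and polarization tensors can be built directly from the sky position. The sketch below is illustrative only; the function names are hypothetical, and the unit vector along $\partial\hat{k}/\partial\lambda$ is normalized with a positive factor, which is an assumption about the intended sign of $\hat{v}$.

```python
import numpy as np


def wave_basis(beta, lam):
    """Return (k, u, v) for ecliptic latitude beta and longitude lam (radians)."""
    sb, cb = np.sin(beta), np.cos(beta)
    sl, cl = np.sin(lam), np.cos(lam)
    k = -np.array([cb * cl, cb * sl, sb])    # propagation direction
    u = np.array([sb * cl, sb * sl, -cb])    # dk/dbeta (already unit length)
    v = np.array([sl, -cl, 0.0])             # dk/dlam, rescaled to unit length
    return k, u, v


def polarization_tensors(beta, lam):
    """Return the tensors e_plus and e_cross of Eq. (2) as 3x3 arrays."""
    _, u, v = wave_basis(beta, lam)
    e_plus = np.outer(u, u) - np.outer(v, v)
    e_cross = np.outer(u, v) + np.outer(v, u)
    return e_plus, e_cross


def strain_tensor(h_plus, h_cross, beta, lam):
    """Assemble the strain tensor of Eq. (2) from the two polarization amplitudes."""
    e_plus, e_cross = polarization_tensors(beta, lam)
    return h_plus * e_plus + h_cross * e_cross


if __name__ == "__main__":
    print(strain_tensor(1e-21, 0.0, beta=0.3, lam=1.2))
```

The resulting tensor is symmetric, traceless, and transverse to $\hat{k}$, as expected in the transverse-traceless gauge.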

The orbital orientation of nonprecessing binaries is described by the inclination $\iota$ (the
angle between the line of sight $-\hat{k}$ and the orbital angular momentum of the binary),
and by their polarization angle $\psi$: specifically, if $h^S_+(\xi)$ and $h^S_\times(\xi)$ are the binary's GW
polarizations in the source frame (i.e., defined with respect to the binary's principal
polarization axes $\hat{p}$ and $\hat{q}$) then

$$
h_+(\xi) + i\,h_\times(\xi) = e^{-2i\psi}\,[h^S_+(\xi) + i\,h^S_\times(\xi)],
\tag{3}
$$

with $\psi = -\arctan(\hat{v}\cdot\hat{p}/\hat{u}\cdot\hat{p})$. Together with $\beta$, $\lambda$, and with the luminosity
distance $D$, $\iota$ and $\psi$ form a set of common standard parameters, listed in Tab. 1 with their
standard lisaXML (see Sec. 3.2) descriptors and units.

2.2. Galactic binaries 



TABLE 2. GalacticBinary source parameters. Note that Amplitude
effectively replaces the standard Distance parameter.

Parameter          Symbol          Standard parameter name   Standard unit
                                   (lisaXML descr.)          (lisaXML descr.)
Amplitude          $\mathcal{A}$   Amplitude                 1 (GW strain)
Frequency          $f$             Frequency                 Hertz
Initial GW phase   $\phi_0$        InitialPhase              Radian

Challenge 1 includes only searches for individually resolvable galactic binaries, as opposed
to quasi-stochastic signals from populations of unresolvable sources. As an added
simplification, all binaries are taken to be circular and monochromatic. Consequently, a
Challenge-1 GalacticBinary source is completely determined by the parameters of
Tables 1 and 2 together. The source-frame polarization amplitudes are computed in the
restricted post-Newtonian approximation, and they are given by

$$
\begin{aligned}
h^S_+(\xi) &= \mathcal{A}\,(1+\cos^2\iota)\cos(2\pi f\xi + \phi_0), \\
h^S_\times(\xi) &= -2\mathcal{A}\,(\cos\iota)\sin(2\pi f\xi + \phi_0).
\end{aligned}
\tag{4}
$$

The amplitude is specified explicitly among the source parameters; it is given in terms
of the underlying physical parameters by $\mathcal{A} = (2\mu/D)\,(\pi M f)^{2/3}$, with $M = m_1 + m_2$ the
total mass, and $\mu = m_1 m_2/M$ the reduced mass.
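
A monochromatic binary is simple enough that the whole chain from physical parameters to SSB polarizations fits in a short sketch. The code below is illustrative, not the MLDC generator: the function names and the numerical conversion constants are assumptions, the amplitude formula is evaluated in geometric units ($G = c = 1$) with the reduced mass taken as $m_1 m_2/(m_1 + m_2)$, and the rotation by the polarization angle follows Eq. (3).

```python
import numpy as np

GMSUN_S = 4.925491e-6    # G * Msun / c^3 in seconds (assumed conversion constant)
PARSEC_S = 1.029271e8    # 1 parsec / c in seconds (assumed conversion constant)


def gb_amplitude(m1, m2, f, distance_pc):
    """Strain amplitude (2 mu / D) (pi M f)^(2/3); masses in Msun, f in Hz, D in pc."""
    total = (m1 + m2) * GMSUN_S
    mu = m1 * m2 / (m1 + m2) * GMSUN_S     # reduced mass
    dist = distance_pc * PARSEC_S
    return (2.0 * mu / dist) * (np.pi * total * f) ** (2.0 / 3.0)


def gb_source_polarizations(xi, amplitude, f, phi0, inclination):
    """Source-frame polarizations of Eq. (4) at retarded times xi (s)."""
    phase = 2.0 * np.pi * f * xi + phi0
    ci = np.cos(inclination)
    h_plus = amplitude * (1.0 + ci**2) * np.cos(phase)
    h_cross = -2.0 * amplitude * ci * np.sin(phase)
    return h_plus, h_cross


def to_ssb_polarizations(h_plus_s, h_cross_s, psi):
    """Rotate source-frame polarizations by the polarization angle psi, Eq. (3)."""
    h = np.exp(-2j * psi) * (h_plus_s + 1j * h_cross_s)
    return h.real, h.imag


if __name__ == "__main__":
    amp = gb_amplitude(0.5, 0.5, 1.0e-3, 1000.0)
    xi = np.arange(0.0, 4000.0, 15.0)
    hp_s, hc_s = gb_source_polarizations(xi, amp, 1.0e-3, 0.0, np.pi / 3)
    hp, hc = to_ssb_polarizations(hp_s, hc_s, psi=0.4)
    print(amp, hp[:3], hc[:3])
```

For a face-on binary ($\iota = 0$) the two polarizations have equal magnitude and are in quadrature, so $h_+^2 + h_\times^2 = 4\mathcal{A}^2$ at all times; this holds before and after the rotation.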

2.3. Massive-black-hole binaries 

For the sake of simplicity, all the massive-black-hole binary GW sources considered
in Challenge 1 are taken to be circular; black-hole spins are ignored, as are the
final plunge and merger phases. In such spin-less, circular, adiabatic binary inspirals,
the Taylor-expanded post-Newtonian equations for energy balance can be integrated
analytically, yielding expressions [10, 11] for the orbital phase $\Phi$ as a function of the
instantaneous orbital frequency $\omega$, and for the time to coalescence $t_c - t$ as a function of
$\omega$. Truncating the two expressions to 2PN order, and inverting the second and substituting
(numerically) in the first, we write the restricted post-Newtonian waveform for the
inspiral as

$$
\begin{aligned}
h^S_+(\xi) &= \frac{2\mu}{D}\,[M\omega(\xi)]^{2/3}\,(1+\cos^2\iota)\cos[2\Phi(\xi)], \\
h^S_\times(\xi) &= -\frac{2\mu}{D}\,[M\omega(\xi)]^{2/3}\,(2\cos\iota)\sin[2\Phi(\xi)].
\end{aligned}
\tag{5}
$$

TABLE 3. BlackHoleBinary source parameters.

Parameter                             Symbol             Standard parameter name      Standard unit
                                                         (lisaXML descr.)             (lisaXML descr.)
Mass of first BH                      $m_1$              Mass1                        SolarMass
Mass of second BH                     $m_2$              Mass2                        SolarMass
Time of coalescence                   $t_c$              CoalescenceTime              Second
Angular orbital phase at time t = 0   $\Phi_0$           InitialAngularOrbitalPhase   Radian
Tapering radius                       $R_{\rm taper}$    TaperApplied                 TotalMass

We end the waveform when one of the following conditions is realized: i) the
(Schwarzschild) last stable orbit is reached, or ii) the "MECO" condition [12] is
fulfilled, or iii) $\omega$ becomes negative. Such a termination engenders ringing in the Fourier
domain. In reality this would not happen, because the inspiral waveform flows smoothly
into the plunge and merger waveforms (which we do not model). Thus, we smooth out
the waveform, beginning at an orbital separation $R_{\rm taper} \in [7, 9]M$, by multiplying it by
the ad hoc taper

$$
w(t) = \bigl(1 - \tanh\,[A\,(M/R - M/R_{\rm taper})]\bigr)/2,
\tag{6}
$$

which equals 1 at large separations and falls to 0 as $R$ decreases through $R_{\rm taper}$, and
where $R$ is approximated with Kepler's law ($R = M^{1/3}\omega^{-2/3}$), and where the dimensionless
coefficient $A = 150$ was determined empirically to produce smooth damping.

The lisaXML standard parameters of these Challenge-1 BlackHoleBinary sources
are listed in Tables 1 and 3.
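
The taper is easy to evaluate once the orbital frequency is known. The following sketch assumes that the window is meant to fall from 1 to 0 as the separation shrinks through the tapering radius, i.e., it uses the form $(1 - \tanh[\cdot])/2$; the function name and the choice of geometric units are assumptions made for this example.

```python
import numpy as np


def taper(omega, total_mass, r_taper, coeff=150.0):
    """Taper w for orbital frequency omega.

    omega      : orbital angular frequency, in units of 1/total_mass
                 (geometric units, G = c = 1)
    total_mass : total mass M, in the same geometric units
    r_taper    : tapering radius in units of M (between 7 and 9 in Challenge 1)
    coeff      : dimensionless steepness A (150 in the text)
    """
    # Kepler's law R = M^(1/3) omega^(-2/3) gives M / R = (M omega)^(2/3).
    m_over_r = (total_mass * omega) ** (2.0 / 3.0)
    return 0.5 * (1.0 - np.tanh(coeff * (m_over_r - 1.0 / r_taper)))


if __name__ == "__main__":
    radii = np.array([100.0, 9.0, 7.0, 6.0])    # separations in units of M
    omegas = radii ** -1.5                      # Kepler's law with M = 1
    print(taper(omegas, 1.0, r_taper=7.0))
```

With $R_{\rm taper} = 7M$ the window is essentially 1 at $R = 100M$, exactly 1/2 at $R = R_{\rm taper}$, and below 1% at the Schwarzschild last stable orbit $R = 6M$.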

2.4. Extreme-Mass-Ratio Inspirals 

The Extreme-Mass-Ratio Inspiral (EMRI) waveforms adopted in Challenge 1 are the 
Barack-Cutler "analytic kludge" waveforms [13], whereby orbits are instantaneously 
approximated as Newtonian ellipses (and gravitational radiation is given by the corresponding
Peters-Matthews formula [14]), but pericenter direction, orbital plane, semi-major axis,
and eccentricity evolve according to post-Newtonian equations. While these
waveforms are not particularly accurate in the highly relativistic regime of interest for 
real EMRI searches, they do exhibit the main qualitative features of the true waveforms, 
and they are considerably simpler to generate. It is expected that any search strategy that 
works for them could be modified fairly easily to deal with the true general-relativistic 
waveforms, once these become available. 

The "analytic kludge" waveforms are too complex to describe in this restricted space, 
so we refer the reader to Refs. [13] and [2], and we content ourselves with presenting a 
complete table of Challenge-1 EMRI parameters in Tab. 4. 



3. ENCODING CHALLENGE DATA SETS 

All MLDC training and challenge data sets are distributed from the MLDC website 
[1] in a standard file format, developed by the MLDC Task Force with the goal of 
facilitating the use of the data sets on different computing platforms, and of enabling 
their identification, perusal, tracking, and archival. The MLDC file format is also used 
internally by the MLDC Task Force within the workflow that leads from the choice of 
random source parameters to the generation of TDI data sets. 



TABLE 4. EMRI source parameters. Note that EMRIs do not use the nonprecessing-binary
inclination $\iota$, and be aware of the collision between the symbols for the EMRI compact-object
mass ($\mu$) and opening angle ($\lambda$), the binary reduced mass (again $\mu$), and the ecliptic
longitude (again $\lambda$).

Parameter                                 Symbol             Standard parameter name            Standard unit
                                                             (lisaXML descr.)                   (lisaXML descr.)
Mass of central BH                        $M$                MassOfSMBH                         SolarMass
Mass of compact object                    $\mu$              MassOfCompactObject                SolarMass
Central-BH spin                           $|S|/M^2$          CoalescenceTime                    Second
Central-BH spin orientation                                  PolarAngleOfSpin,                  Radian
  w.r.t. SSB frame                                           AzimuthalAngleOfSpin
Azimuthal orbital freq. at t = 0          $\nu_0$            InitialAzimuthalOrbitalFrequency   Hertz
Azimuthal orbital phase at t = 0          $\Phi_0$           InitialAzimuthalOrbitalPhase       Radian
Eccentricity at t = 0                     $e_0$              InitialEccentricity                1
Direction of pericenter at t = 0          $\tilde\gamma_0$   InitialTildeGamma                  Radian
Direction of orbital angular              $\alpha_0$         InitialAlphaAngle                  Radian
  momentum w.r.t. S at t = 0
Opening angle                             $\lambda$          LambdaAngle                        Radian



3.1. File structure 

Depending on its use, an MLDC file will contain one or more of the following sections:

• A prolog including file metadata such as author, generation date, the name and version
of the computer code used to create the data, and any other relevant comments.

• A LISA data section describing the model of the LISA orbits used in the simulations;
for the initial challenges, this amounts to the InitialPosition and
InitialRotation needed to fully specify the pseudo-LISA orbits of Sec. 1.1.

• A noise data section describing the models of the LISA noises used in the simulations;
for the initial challenges, this amounts to the power spectral densities and
generation timesteps of the pseudo-random sequences described in Sec. 1.3.

• A source data section describing the gravitational waveform(s) included in the
simulation. The sources may be specified in terms of the standard source parameters
defined in Sec. 2; otherwise, they may be represented as explicit time series of
the $h_+$ and $h_\times$ gravitational strains at the SSB, plus a minimal set of parameters
including the source's sky position and the time offset, cadence, and length of the
time series.

• A TDI data section containing one or more time series of TDI observables, assembled
from LISA's response to noises and/or sources, as described in Sec. 1.2; for the
initial challenges, the observables of choice are X, Y, and Z of TDI 1.5, plus (trivially)
the SSB time. The standard names of these observables are Xp, Yp, and Zp
for the equivalent-strain version of the data sets (generated with the LISA Simulator),
and Xf, Yf, and Zf for the fractional-frequency-fluctuation version (generated
with Synthetic LISA).



[Figure 1: flowchart titled "The Mock LISA Data Challenge Workflow". Processing steps:
specify LISA orbits and noises; draw source parameters randomly from a priori distribution;
compute gravitational strains at SSB; generate noises and TDI responses (simulators).
Files: LISA model file (prolog, LISA data, noise data); source parameter file (prolog,
parametric source data); source strain file (prolog, sampled source data); training file
(prolog, LISA data, noise data, parametric source data, TDI data); challenge file (prolog,
LISA data, noise data without seeds, TDI data).]

FIGURE 1. The MLDC workflow: the final data products (the training and challenge data sets) and
the intermediate work data are all represented as MLDC files that include different combinations of the
possible sections.



Thus, training data sets will be represented by MLDC files including all types of sections;
on the other hand, challenge data sets (which must be "blind") will omit the
source-data section. Different combinations of the sections appear in the intermediate 
files used in the MLDC workflow, shown in Fig. 1. 

3.2. Implementation 

The MLDC file format is implemented using XML (the eXtensible Markup Language),
a simple, flexible text format related to HTML, and widely used in the exchange,
manipulation, and storage of many kinds of data, especially on the world-wide web
[15]. Software libraries to handle XML are readily available for most computer languages.
XML documents consist of a nested hierarchy of elements, enclosed by opening
and closing tags (<tagname> and </tagname>), and containing textual data; elements
may also have textual attributes (as in <tagname attrname="attrvalue">). White
space and newlines have no meaning within XML (any sequence of such characters is
in fact equivalent to a single white space), but they are usually added liberally to help
human parsing of the files.

The XML implementation of the MLDC file format (known as lisaXML) is based on 
XSIL (the eXtensible Scientific Interchange Language), an XML dialect developed at
Caltech to represent scientific data in multiple applications [16]. XSIL is very terse; its
data structures consist of a few simple building elements:

• <xsil Name="..." Type="..."> acts as a hierarchical container. For instance,
the LISA-data section of an MLDC file is represented by an <xsil
Type="LISAData"> element, and the source-data section is represented by
a <xsil Type="SourceData"> element containing one or more <xsil
Name="..." Type="PlaneWave"> elements.

• <Param Name="..." Unit="..."> is used to describe parameter values and
their units. For GW sources, the Name and Unit attributes are those found in Tables
1-4. For instance, GalacticBinary may have <Param Name="Frequency"
Unit="Hertz">1.0e-3</Param>.

• <Array Name="..." Type="..."> is used to specify arrays of homogeneous
data, such as the time series of TDI observables. In lisaXML, the actual array elements
are stored in separate binary files, but the Array element contains information
about the location, organization, and encoding of the binary files.
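
Because the parameter sections are plain XML, they can be read with any standard XML library. The sketch below uses Python's built-in xml.etree.ElementTree on a small hand-written fragment; the fragment itself, the source name, the parameter values, and the helper function are hypothetical, and actual MLDC files contain more structure (and may capitalize the container tag differently, which is why the tag comparison here ignores case).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment assembled from the elements described above;
# all values are made up for illustration.
SAMPLE = """
<xsil>
  <xsil Type="SourceData">
    <xsil Name="Example binary" Type="PlaneWave">
      <Param Name="EclipticLatitude" Unit="Radian">0.25</Param>
      <Param Name="EclipticLongitude" Unit="Radian">1.50</Param>
      <Param Name="Polarization" Unit="Radian">0.40</Param>
      <Param Name="Inclination" Unit="Radian">1.00</Param>
      <Param Name="Amplitude" Unit="1">1.5e-22</Param>
      <Param Name="Frequency" Unit="Hertz">1.0e-3</Param>
      <Param Name="InitialPhase" Unit="Radian">0.0</Param>
    </xsil>
  </xsil>
</xsil>
"""


def read_source_params(xml_text):
    """Return {source name: {parameter name: (value, unit)}} for PlaneWave sources."""
    root = ET.fromstring(xml_text.strip())
    sources = {}
    for elem in root.iter():
        if elem.tag.lower() == "xsil" and elem.get("Type") == "PlaneWave":
            params = {}
            for par in elem.findall("Param"):
                params[par.get("Name")] = (float(par.text), par.get("Unit"))
            sources[elem.get("Name")] = params
    return sources


if __name__ == "__main__":
    for name, params in read_source_params(SAMPLE).items():
        print(name, params["Frequency"])
```

Keeping the unit next to each value, as the Param element does, lets analysis codes check units before use instead of assuming them.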

The advantage of this hybrid XML/binary arrangement is that bulk data are represented 
in binary format, which can be saved, stored, and loaded very efficiently, while file 
metadata and the LISA, noise, and source parameters are contained in textual XML files 
that are easily parseable and editable by humans (or by powerful XML libraries). In fact, 
lisaXML files can be viewed in standard-compliant web browsers such as Firefox and 
Safari, which will use a set of special lisaXML stylesheet transformations [17] to render 
the metadata and nested parameter structures (but not the bulk data!) as pleasantly
formatted tables. 

3.3. Usage 

The MLDC Task Force has developed dedicated software tools to read and write 
lisaXML files from different computing environments: 

• C/C++. The LISA Tools Subversion archive [18] includes a directory
lisaXML/io-C, which contains lisaXML input-output routines in C; TDI
and strain time series are returned as simple C arrays, while source and time series
parameters are stored in C structures.

• Python. Routines to input and output lisaXML are built into the Python-steered
Synthetic LISA [5], which can read (write) strain and TDI time series into (from)
standard NumPy arrays [19]. In addition, the Synthetic LISA Python/C++ objects
that describe the LISA geometry, the LISA noises, and GW sources can also be
translated to and from their lisaXML representation.

• MATLAB. The program xml2matlab.c, found in directory lisaXML/C-examples
in the LISA Tools Subversion archive [18], can be compiled into a MATLAB
MEX function that will read lisaXML TDI time series into a MATLAB array.

• ASCII. The program xml2ascii.c, in the same location, can be compiled into a
command-line utility that will dump lisaXML TDI time series into a tab-separated
ASCII file, or to standard output.

All these tools are still in development at the time of writing, but all are already quite 
functional in reading MLDC data sets into analysis applications. For more detailed 
instructions on reading lisaXML files, see the MLDC omnibus documents [2] and the 
LISA Tools SourceForge website [18]; these include also the instructions and scripts 
necessary to reproduce the MLDC workflow and create additional training sets. 
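
As a simple example of downstream use, a tab-separated ASCII dump of TDI time series can be loaded with NumPy. This sketch assumes a four-column layout (SSB time, then X, Y, Z); the actual column order and any header lines written by xml2ascii should be checked against the MLDC documentation, and the function name and sample values below are made up for illustration.

```python
import io

import numpy as np


def load_tdi_ascii(source):
    """Load a whitespace/tab-separated dump with assumed columns t, X, Y, Z.

    `source` may be a filename or an open file-like object.
    """
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != 4:
        raise ValueError("expected 4 columns (t, X, Y, Z), got %d" % data.shape[1])
    t, x, y, z = data.T
    return {"t": t, "X": x, "Y": y, "Z": z}


if __name__ == "__main__":
    # Made-up sample rows standing in for a real dump.
    sample = io.StringIO(
        "0.0\t1.0e-21\t-2.0e-21\t1.0e-21\n"
        "15.0\t1.1e-21\t-1.9e-21\t0.8e-21\n"
        "30.0\t1.2e-21\t-1.8e-21\t0.6e-21\n"
    )
    tdi = load_tdi_ascii(sample)
    print("cadence:", tdi["t"][1] - tdi["t"][0], "s;", tdi["X"].size, "samples")
```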